Abstract



We introduce the concept of knowledge states; many well-known algorithms can be viewed
as knowledge state algorithms. The knowledge state approach can be used to construct
competitive randomized online algorithms and to study the tradeoff between competitiveness and
memory. A knowledge state simply states conditional obligations of an adversary, by fixing a
work function, and gives a distribution for the algorithm. When a knowledge state algorithm
receives a request, it calculates one or more "subsequent" knowledge states, together with
a probability of transition to each. The algorithm then uses randomization to select one of those
subsequents to be the new knowledge state. We apply the method to the paging problem. We
present optimally competitive algorithms for paging for the cases where the cache sizes are k = 2
and k = 3. These algorithms use only a very limited number of bookmarks.
1 Motivation and Background

In this paper we introduce a new method for constructing randomized online algorithms, which we
call the knowledge state model. The purpose of this method is to address the trade-off between
memory and competitiveness. The model is introduced and fully described for the first time in this
publication, but we note that a number of published algorithms are implicitly consistent with the
model, although they do not use its full power. For example, the algorithm EQUITABLE [1] is a
knowledge state algorithm for the $k$-cache problem that achieves the optimal randomized
competitiveness of $H_k$ for each $k$, using only $O(k^2 \log k)$ memory, as opposed to the prior
algorithm, PARTITION, which uses the full information contained in the work function, and hence
requires unlimited memory as the length of the request sequence grows. At the other end of the
scale, the randomized algorithm RANDOM_SLACK [10] is in fact an extremely simple knowledge state
algorithm, which achieves randomized 2-competitiveness for the 2-server problem for all metric
spaces, and which achieves randomized $k$-competitiveness for the $k$-server problem on some
spaces, including trees. We also note that RANDOM_SLACK is trackless and is an order 1 knowledge
state algorithm, i.e., its distribution is supported by only one state. (See the recent ACM SIGACT
column [5] for a summary of tracklessness; see also [31 IU El IS].) We also note that we have
recently used the knowledge state technique to develop an optimally competitive algorithm for the
caching problem in shared memory multiprocessor systems [6].
It is still an open question whether there exists an optimally competitive order $O(k)$ bookmark
randomized algorithm for the $k$-cache problem. An affirmative answer to this question would
settle an open problem listed in [7]. In this paper we describe progress on this question. We
give an order 2 knowledge state algorithm which is provably $\frac{3}{2}$-competitive. Since an
equivalent behavioral algorithm must keep one "bookmark," namely the address of an ejected page,
it is not an improvement over our earlier result [3], but it does illustrate the knowledge state
technique in a simple way. We then give an order 3 knowledge state algorithm which is provably
$\frac{11}{6}$-competitive, which is an improvement, in terms of memory requirements, over
EQUITABLE for the case $k = 3$ (Section 4).

We also consider the problem of breaking the 2-competitive barrier for the randomized
competitiveness of the 2-server problem, a goal which has, as yet, been achieved only in special
cases (Section 5). For the class of uniform spaces, this barrier was broken by PARTITION [11]. For
the line, a $\frac{155}{78}$-competitive algorithm was given by Bartal et al. [2].

In this paper we give a formal description of the knowledge state method. It is defined using the
mixed model of online computation, which is described in Section 2. This section relates the mixed
model to the standard models of online computation, and explains how a behavioral algorithm can
be derived from a mixed model description. Section 3 defines the knowledge state method (in terms
of the mixed model) and shows how potentials can be used to derive the competitive ratio of a
knowledge state algorithm. Even though the concepts in Sections 2 and 3 are natural and intuitive,
some of the formal arguments needed to justify our method are somewhat involved. In Section 4 the
method is applied to the paging problem; two optimally competitive algorithms are presented. We
discuss ongoing experimental work for the server problem in Section 5.

2 The Mixed Model of Online Computation 

We will introduce a new model of randomized online computation which is a generalization of
both the classic behavioral and distributional models. We assume that we are given an online
problem with a set of states $X$ (also called configurations), a fixed start state $x^0 \in X$,
and a set of requests $\mathcal{R}$. If the current state is $x \in X$ and a request
$r \in \mathcal{R}$ is given, an algorithm for the problem must service the request by choosing a
new state $y$ and paying a cost, which we denote $\mathit{cost}(x,r,y)$. It is convenient to
assume that there is a "distance" function $d$ on $X$, and that it is possible to move from state
$x$ to state $y$ at cost $d(x,y)$ at any time, given no request. We will assume that
$d(x,x) = 0$ and $d(x,z) \le d(x,y) + d(y,z)$ for any states $x, y, z$. It follows that
$\mathit{cost}(u,r,v) \le d(u,x) + \mathit{cost}(x,r,y) + d(y,v)$ for any states $u, x, y, v$ and
request $r$. Formally, in this paper we refer to an online problem as an ordered triple
$\mathcal{P} = (X, \mathcal{R}, d)$. Examples of online problems satisfying these conditions
abound, such as the server problem and the cache problem.

Given a request sequence $\varrho = r^1, \ldots, r^n$, an algorithm must choose a sequence of
states $x^1, \ldots, x^n$, the service. The cost of this service is defined to be
$\sum_{t=1}^{n} \mathit{cost}(x^{t-1}, r^t, x^t)$. An offline algorithm knows $\varrho$ before
choosing the service sequence, while an online algorithm must choose $x^t$ without knowledge of
the future requests. We will assume that there is an optimal offline algorithm, opt, which
computes an optimal service sequence for any given request sequence. As is customary, we say that
a deterministic online algorithm $A$ is $C$-competitive for a given number $C$ if there exists a
constant $K$ (not dependent on $\varrho$) such that
$\mathit{cost}_A(\varrho) \le C \cdot \mathit{cost}_{opt}(\varrho) + K$ for any request sequence
$\varrho$. Similarly, we say that a randomized online algorithm $A$ is $C$-competitive for a given
number $C$ if there exists a constant $K$ (not dependent on $\varrho$) such that
$E(\mathit{cost}_A(\varrho)) \le C \cdot \mathit{cost}_{opt}(\varrho) + K$ for any request
sequence $\varrho$, where $E$ denotes expected value.

In order to make the description of various models of randomized online computation more
precise, we introduce the following notation. Let $\Pi$ be the set of all finite distributions on
$X$. If $\pi \in \Pi$ and $S \subseteq X$, we say that $S$ supports the distribution $\pi$ if
$\pi(S) = 1$. The distributional support (or "support" for short) of any $\pi \in \Pi$ is defined
to be the unique minimal set which supports $\pi$. By an abuse of notation, if the support of
$\pi$ is a singleton $\{x\}$, we write $\pi = x$.

An instance of the transportation problem is a weighted directed bipartite graph with
distributions on both parts. Formally, an instance is an ordered quintuple
$(A, B, \mathit{cost}, \alpha, \beta)$ where $A$ and $B$ are finite non-empty sets, $\alpha$ is a
distribution on $A$, $\beta$ is a distribution on $B$, and $\mathit{cost}$ is a real-valued
function on $A \times B$. A solution to this instance is a distribution $\gamma$ on $A \times B$
such that

1. $\gamma(\{a\} \times B) = \alpha(a)$ for all $a \in A$

2. $\gamma(A \times \{b\}) = \beta(b)$ for all $b \in B$.

Then $\mathit{cost}(\gamma) = \sum_{a \in A} \sum_{b \in B} \gamma(a,b)\,\mathit{cost}(a,b)$, and
$\gamma$ is a minimal solution if $\mathit{cost}(\gamma)$ is minimized over all solutions, in
which case we call $\mathit{cost}(\gamma)$ the minimum transportation cost.
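As a minimal sketch of the two marginal conditions and of the cost of a solution, the following Python fragment checks a candidate solution and evaluates its cost. The function names `is_solution` and `transport_cost`, the dictionary representation of distributions, and the small instance are illustrative assumptions, not part of the paper. The fragment only verifies and evaluates a given solution; it does not search for a minimal one.

```python
from fractions import Fraction as F

def is_solution(gamma, alpha, beta):
    """Check that gamma (dict (a, b) -> mass) has marginals alpha and beta."""
    if any(g < 0 or a not in alpha or b not in beta for (a, b), g in gamma.items()):
        return False
    rows_ok = all(sum(g for (a2, _), g in gamma.items() if a2 == a) == p
                  for a, p in alpha.items())
    cols_ok = all(sum(g for (_, b2), g in gamma.items() if b2 == b) == p
                  for b, p in beta.items())
    return rows_ok and cols_ok

def transport_cost(gamma, cost):
    """cost(gamma) = sum over (a, b) of gamma(a, b) * cost(a, b)."""
    return sum(g * cost[a, b] for (a, b), g in gamma.items())

# Hypothetical instance: two points on each side, unit cost off the diagonal.
alpha = {"a1": F(1, 2), "a2": F(1, 2)}
beta = {"b1": F(1, 2), "b2": F(1, 2)}
cost = {("a1", "b1"): 0, ("a1", "b2"): 1, ("a2", "b1"): 1, ("a2", "b2"): 0}

diagonal = {("a1", "b1"): F(1, 2), ("a2", "b2"): F(1, 2)}
product = {(a, b): alpha[a] * beta[b] for a in alpha for b in beta}
```

Both `diagonal` and `product` satisfy the marginal conditions, but only `diagonal` is minimal here: its cost is 0, which no solution can beat because all costs are non-negative.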

There are three standard models of randomized online algorithms (see, for example, [7]). We
introduce a new model in this paper, which we call the mixed model. Those three standard models 
are: distribution of deterministic online algorithms, the behavioral model, and the distributional 
model. We very briefly describe the three standard models. 

Distribution of Deterministic Online Algorithms. In this model, A is a random variable 
whose value is a deterministic online algorithm. If the random variable has a finite distribution, we 
say that A is barely random. 

Behavioral Online Algorithms. In this model $A$ uses randomization at each step to pick the
next configuration. We assume that $A$ has memory. Let $\mathcal{M}$ be the set of all possible
memory states of $A$. We define a full state of $A$ to be an ordered pair
$k = (x, m) \in X \times \mathcal{M}$. Let $m^0 \in \mathcal{M}$ be the initial memory state, and
let $m^t$ be the memory state of $A$ after servicing the first $t$ requests.

Then $A$ uses randomization to compute $k^t = (x^t, m^t)$, the full state after $t$ steps, given
only $k^{t-1}$ and $r^t$. A behavioral algorithm can then be thought of as a function on
$X \times \mathcal{M} \times \mathcal{R}$ whose values are random variables in
$X \times \mathcal{M}$.

Distributional Online Algorithms. If $\pi, \pi' \in \Pi$, let $S$ be the support of $\pi$ and $S'$
be the support of $\pi'$. We then define $d(\pi, \pi')$ to be the minimum transportation cost of
the transportation problem $(S, S', d, \pi, \pi')$, and if $r \in \mathcal{R}$, we define
$\mathit{cost}(\pi, r, \pi')$ to be the minimum transportation cost of the transportation problem
$(S, S', \mathit{cost}^r, \pi, \pi')$, where
$\mathit{cost}^r = \mathit{cost}(\cdot, r, \cdot) : X \times X \to \mathbb{R}$.
A distributional online algorithm $A$ is then defined as follows.

1. There is a set $\mathcal{M}$ of memory states of $A$. There is a start memory state
$m^0 \in \mathcal{M}$.

2. A full state of $A$ is a pair $k = (\pi, m) \in \Pi \times \mathcal{M}$. The initial full state
is $k^0 = (\pi^0, m^0)$, where $\pi^0 = s^0$ is the distribution concentrated on the start state
$s^0 = x^0$.

3. For any given full state $k = (\pi, m)$ and request $r$, $A$ deterministically computes a new
full state $k' = (\pi', m')$, using only the inputs $\pi$, $m$, and $r$. We write
$A(\pi, m, r) = (\pi', m')$ or alternatively $A(k, r) = k'$. Thus, $A$ is a function from
$\Pi \times \mathcal{M} \times \mathcal{R}$ to $\Pi \times \mathcal{M}$.

4. Given any input sequence $\varrho = r^1 \ldots r^n$, $A$ computes a sequence of full states
$A(\varrho) = k^1, \ldots, k^n$, following the rule that
$k^t = (\pi^t, m^t) = A(k^{t-1}, r^t)$ for all $t \ge 1$. Define
$\mathit{cost}_A(\varrho) = \sum_{t=1}^{n} \mathit{cost}(\pi^{t-1}, r^t, \pi^t)$.

We note that a distributional online algorithm, despite being a model for a randomized online
algorithm, is in fact deterministic, in the sense that the full states are computed
deterministically.

The following theorem is well-known. (It is, for example, implicit in Chapter 6 of [7].)

Theorem 1 All three of the above models of randomized online algorithms are equivalent, in the
following sense. If $A_1$ is an algorithm of one of the models, there exist algorithms $A_2$,
$A_3$, of each of the other models, such that, given any request sequence $\varrho$, the cost (or
expected cost) of each $A_i$ for $\varrho$ is no greater than the cost (or expected cost) of $A_1$.

The Mixed Model. The mixed model of randomized algorithms is a generalization of both the 
behavioral model and the distributional model. A mixed online algorithm chooses a distribution at 
each step, but, as opposed to a distributional algorithm, which must make that choice determinis- 
tically, can use randomization to choose the distribution. 

A mixed online algorithm $A$ for an online problem $\mathcal{P} = (X, \mathcal{R}, d)$ is defined
as follows. As before, let $\Pi$ be the set of finite distributions on $X$.

1. There is a set $\mathcal{M}$ of memory states of $A$. There is a start memory state
$m^0 \in \mathcal{M}$.

2. A full state of $A$ is a pair $k = (\pi, m) \in \Pi \times \mathcal{M}$. The initial full state
is $k^0 = (\pi^0, m^0)$, where $\pi^0 = s^0$ is the distribution concentrated on the start state
$s^0 = x^0$.

3. For any given full state $k = (\pi, m)$ and request $r$, there exists a finite set of full
states $k_1, \ldots, k_m$ and probabilities $\lambda_1, \ldots, \lambda_m$, where
$\sum_{i=1}^{m} \lambda_i = 1$, such that if the current full state is $k$ and the next request is
$r$, $A$ uses randomization to compute a new full state $k' = (\pi', m')$, by selecting
$k' = k_i$ for some $i$. The probability that $A$ selects each given $k_i$ is $\lambda_i$. We call
the $\{k_i\}$ the subsequents and the $\{\lambda_i\}$ the weights of the subsequents, for the
request $r$ from the full state $k$.

$A$ is a function on $\Pi \times \mathcal{M} \times \mathcal{R}$ whose values are random variables
in $\Pi \times \mathcal{M}$. We can write $A(\pi, m, r) = (\pi', m')$. Alternatively, we write
$A(k, r) = k'$. For fixed $k$ and $r$, the quantities $k'$, $\pi'$, and $m'$ can be regarded as
random variables.

4. Given any input sequence $\varrho = r^1 \ldots r^n$, $A$ computes a sequence of full states
$A(\varrho) = (\pi^1, m^1) \ldots (\pi^n, m^n)$, following the rule that
$k^t = (\pi^t, m^t) = A(k^{t-1}, r^t)$ for all $t \ge 1$. Note that, for all $t > 0$, $k^t$,
$\pi^t$, and $m^t$ are random variables.

Computing the cost of a step of a mixed model online algorithm $A$ is somewhat tricky. We note
that it might seem that $\sum_{i=1}^{m} \lambda_i\,\mathit{cost}(\pi, r, \pi_i)$ would be that
cost; however, this is an overestimate.

Without loss of generality, $A$ is sensible. Let $k = (\pi, m) \in \Pi \times \mathcal{M}$ and let
$r \in \mathcal{R}$. Let $S \subseteq X$ be the support of $\pi$. Let $\{k_i = (\pi_i, m_i)\}$ be
the subsequents and $\{\lambda_i\}$ the weights of the subsequents, for the request $r$ from the
full state $k$. Let $\bar{S} \subseteq X$ be the union of the supports of the $\{\pi_i\}$. Define
$\bar{\pi} = \sum_{i=1}^{m} \lambda_i \pi_i$. Note that $\bar{\pi} \in \Pi$, and its support is
$\bar{S}$. Define $\mathit{cost}_A(k, r) = \mathit{cost}(\pi, r, \bar{\pi})$.

Finally, if $\varrho = r^1 \ldots r^n$ is the input request sequence, and the sequence of full
states of $A$ is $k^1 \ldots k^n$, we define
$\mathit{cost}_A(\varrho) = \sum_{t=1}^{n} \mathit{cost}_A(k^{t-1}, r^t)$.
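The gap between the weighted sum of costs and the cost to the mixture can be seen on a tiny instance. The sketch below makes two simplifying assumptions that are not in the text: the metric is uniform (distinct states are at distance 1), and the request is one for which $\mathit{cost}(x,r,y) = d(x,y)$ on the states involved. Under the uniform metric the minimum transportation cost between two distributions is one minus the mass that can stay in place. The helper names `mixture` and `uniform_transport_cost` are hypothetical.

```python
from fractions import Fraction as F

def mixture(weights, dists):
    """Return the mixture sum_i lambda_i * pi_i as a dict state -> probability."""
    out = {}
    for lam, pi in zip(weights, dists):
        for x, p in pi.items():
            out[x] = out.get(x, 0) + lam * p
    return out

def uniform_transport_cost(alpha, beta):
    """Minimum transportation cost when d(x, y) = 1 for all x != y (assumption):
    mass min(alpha(x), beta(x)) stays at x for free, the rest moves at cost 1."""
    states = set(alpha) | set(beta)
    stay = sum(min(alpha.get(x, 0), beta.get(x, 0)) for x in states)
    return 1 - stay

# Current distribution: half on x, half on y.
pi = {"x": F(1, 2), "y": F(1, 2)}
# Two subsequents, each concentrated on one state, each chosen with weight 1/2.
subsequents = [{"x": F(1)}, {"y": F(1)}]
weights = [F(1, 2), F(1, 2)]

pi_bar = mixture(weights, subsequents)
step_cost = uniform_transport_cost(pi, pi_bar)
weighted_sum = sum(lam * uniform_transport_cost(pi, p)
                   for lam, p in zip(weights, subsequents))
```

Here the mixture equals the current distribution, so the step cost is 0, while the weighted sum of the individual transportation costs is 1/2.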

We now prove that the mixed model for randomized online algorithms is equivalent to the three 
standard models. 

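Lemma 1 If $A$ is a mixed online algorithm, there is a behavioral online algorithm $A'$ such that,
for any request sequence $\varrho$, $E(\mathit{cost}_{A'}(\varrho)) = E(\mathit{cost}_A(\varrho))$.

Proof: A memory state of $A'$ will be a full state of $A$, i.e., we could write
$\mathcal{M}' \subseteq \Pi \times \mathcal{M}$. By a slight abuse of notation, we also define a
full state of $A'$ to be an ordered triple $(x, \pi, m) \in X \times \Pi \times \mathcal{M}$
such that $(\pi, m)$ is a full state of $A$ and $\pi(x) > 0$. Intuitively, $A'$ keeps track of its
true state $x \in X$, while remembering the full state $(\pi, m)$ of an emulation of $A$.

For clarity of the proof, we introduce more complex notation for some of the quantities defined
earlier. Let $\pi, \sigma \in \Pi$, $m, n \in \mathcal{M}$, and $r \in \mathcal{R}$. If $(\pi, m)$
is a full state of $A$, define $\lambda_{\pi,m,r,\sigma,n}$ to be the probability that
$A(\pi, m, r) = (\sigma, n)$, i.e., the conditional probability that $A$ chooses $(\sigma, n)$ to
be the next full state, given that the current full state is $(\pi, m)$ and the request is $r$. We
assume that there can be at most finitely many choices of $(\sigma, n)$ for which
$\lambda_{\pi,m,r,\sigma,n} > 0$. In case $(\pi, m)$ is not a full state of $A$, then
$\lambda_{\pi,m,r,\sigma,n}$ is defined to be zero. If $(\pi, m)$ is a full state of $A$ and
$r \in \mathcal{R}$, write
$\bar{\pi}_{\pi,m,r} = \sum_{\sigma \in \Pi, n \in \mathcal{M}} \lambda_{\pi,m,r,\sigma,n} \cdot \sigma \in \Pi$,
and choose a finite distribution $\gamma_{\pi,m,r}$ on $X \times X$ which is a minimal solution to
the transportation problem $(X, X, \mathit{cost}^r, \pi, \bar{\pi}_{\pi,m,r})$, where
$\mathit{cost}^r(x, y) = \mathit{cost}(x, r, y)$. Thus

\[
\pi(x) = \sum_{y \in X} \gamma_{\pi,m,r}(x, y) \ \text{ for } x \in X; \qquad
\bar{\pi}_{\pi,m,r}(y) = \sum_{x \in X} \gamma_{\pi,m,r}(x, y) \ \text{ for } y \in X;
\]
\[
\mathit{cost}_A(\pi, m, r) = \sum_{x \in X} \sum_{y \in X} \gamma_{\pi,m,r}(x, y)\,\mathit{cost}(x, r, y).
\]

We now formally describe the action of the behavioral algorithm $A'$. The initial full state of
$A'$ is $(x^0, k^0) = (x^0, \pi^0, m^0)$. Given that the full state of $A'$ is $(x, \pi, m)$ and
the next request is $r \in \mathcal{R}$, and given any
$(y, \sigma, n) \in X \times \Pi \times \mathcal{M}$, we define $\Lambda_{x,\pi,m,r,y,\sigma,n}$,
the probability that $A'$ chooses the next full state to be $(y, \sigma, n)$, as follows:

If $\bar{\pi}_{\pi,m,r}(y) = 0$, then $\Lambda_{x,\pi,m,r,y,\sigma,n} = 0$.

Otherwise,
\[
\Lambda_{x,\pi,m,r,y,\sigma,n} =
\frac{\gamma_{\pi,m,r}(x, y) \cdot \sigma(y) \cdot \lambda_{\pi,m,r,\sigma,n}}{\pi(x) \cdot \bar{\pi}_{\pi,m,r}(y)}.
\]

Let $\varrho$ be a given request sequence. We now prove that
$E(\mathit{cost}_{A'}(\varrho)) = E(\mathit{cost}_A(\varrho))$. For any $t \ge 0$ and any full
state $(\pi, m)$ of $A$, define $p^t(\pi, m)$ to be the probability that the full state of $A$ is
$(\pi, m)$ after $t$ steps. Additionally, if $x \in X$, define $q^t(x, \pi, m)$ to be the
probability that the full state of $A'$ is $(x, \pi, m)$ after $t$ steps.

To prove the lemma we consider first the following two claims:

1. For any $t \ge 0$, $x \in X$, $\pi \in \Pi$, and $m \in \mathcal{M}$,
$q^t(x, \pi, m) = p^t(\pi, m) \cdot \pi(x)$.

2. For any $t \ge 0$, $\pi \in \Pi$, and $m \in \mathcal{M}$,
$\sum_{x \in X} q^t(x, \pi, m) = p^t(\pi, m)$.

We prove claims 1 and 2 by simultaneous induction on $t$. If $t = 0$, both claims are trivial by
definition. Now, suppose $t > 0$. We verify claim 1 for $t$. By the inductive hypothesis, both
claims hold for $t - 1$. Write $r = r^t$. Let $(y, \sigma, n) \in X \times \Pi \times \mathcal{M}$.
If $(\sigma, n)$ is not a full state of $A$ or $\sigma(y) = 0$, we are done. Otherwise, recall that
$\bar{\pi}_{\pi,m,r}(y) = \sum_{x \in X} \gamma_{\pi,m,r}(x, y)$ for all $y \in X$, and we obtain

\[
\begin{aligned}
q^t(y, \sigma, n)
&= \sum_{(x,\pi,m) \in X \times \Pi \times \mathcal{M}} q^{t-1}(x, \pi, m) \cdot \Lambda_{x,\pi,m,r,y,\sigma,n} \\
&= \sum_{\substack{(x,\pi,m) \in X \times \Pi \times \mathcal{M} \\ \pi(x) > 0,\ \bar{\pi}_{\pi,m,r}(y) > 0}}
   p^{t-1}(\pi, m)\,\pi(x) \cdot
   \frac{\gamma_{\pi,m,r}(x, y) \cdot \sigma(y) \cdot \lambda_{\pi,m,r,\sigma,n}}{\pi(x) \cdot \bar{\pi}_{\pi,m,r}(y)} \\
&= \sum_{\substack{(x,\pi,m) \in X \times \Pi \times \mathcal{M} \\ \bar{\pi}_{\pi,m,r}(y) > 0}}
   p^{t-1}(\pi, m) \cdot
   \frac{\gamma_{\pi,m,r}(x, y) \cdot \sigma(y) \cdot \lambda_{\pi,m,r,\sigma,n}}{\bar{\pi}_{\pi,m,r}(y)} \\
&= \sigma(y) \cdot \sum_{\substack{(\pi,m) \in \Pi \times \mathcal{M} \\ \bar{\pi}_{\pi,m,r}(y) > 0}}
   p^{t-1}(\pi, m) \cdot \lambda_{\pi,m,r,\sigma,n} \cdot
   \frac{\sum_{x \in X} \gamma_{\pi,m,r}(x, y)}{\bar{\pi}_{\pi,m,r}(y)} \\
&= \sigma(y) \cdot \sum_{\substack{(\pi,m) \in \Pi \times \mathcal{M} \\ \bar{\pi}_{\pi,m,r}(y) > 0}}
   p^{t-1}(\pi, m) \cdot \lambda_{\pi,m,r,\sigma,n} \\
&= \sigma(y) \cdot \sum_{(\pi,m) \in \Pi \times \mathcal{M}} p^{t-1}(\pi, m) \cdot \lambda_{\pi,m,r,\sigma,n}
 = \sigma(y) \cdot p^t(\sigma, n)
\end{aligned}
\]

which verifies claim 1 for $t$. Claim 2 for $t$ follows trivially.

For the conclusion of the lemma, let $t > 0$, and let $r = r^t$. We use claim 1 for $t - 1$. Recall
that $\bar{\pi}_{\pi,m,r} = \sum_{\sigma \in \Pi, n \in \mathcal{M}} \lambda_{\pi,m,r,\sigma,n} \cdot \sigma$
for any full state $(\pi, m)$ of $A$. Then

\[
\begin{aligned}
E(\mathit{cost}^t_{A'})
&= \sum_{\substack{\pi, \sigma \in \Pi,\ m, n \in \mathcal{M} \\ x, y \in X}}
   q^{t-1}(x, \pi, m) \cdot \Lambda_{x,\pi,m,r,y,\sigma,n} \cdot \mathit{cost}(x, r, y) \\
&= \sum_{\substack{\pi, \sigma \in \Pi,\ m, n \in \mathcal{M},\ x, y \in X \\ \pi(x) > 0,\ \bar{\pi}_{\pi,m,r}(y) > 0}}
   p^{t-1}(\pi, m) \cdot \pi(x) \cdot
   \frac{\gamma_{\pi,m,r}(x, y) \cdot \sigma(y) \cdot \lambda_{\pi,m,r,\sigma,n}}{\pi(x) \cdot \bar{\pi}_{\pi,m,r}(y)}
   \cdot \mathit{cost}(x, r, y) \\
&= \sum_{\substack{\pi \in \Pi,\ m \in \mathcal{M},\ x, y \in X \\ \bar{\pi}_{\pi,m,r}(y) > 0}}
   \left( p^{t-1}(\pi, m) \cdot \gamma_{\pi,m,r}(x, y) \cdot \mathit{cost}(x, r, y) \cdot
   \frac{\sum_{\sigma \in \Pi, n \in \mathcal{M}} \lambda_{\pi,m,r,\sigma,n}\,\sigma(y)}{\bar{\pi}_{\pi,m,r}(y)} \right) \\
&= \sum_{\pi \in \Pi,\ m \in \mathcal{M},\ x, y \in X}
   p^{t-1}(\pi, m) \cdot \gamma_{\pi,m,r}(x, y) \cdot \mathit{cost}(x, r, y) \\
&= \sum_{\pi \in \Pi,\ m \in \mathcal{M}} p^{t-1}(\pi, m) \cdot
   \left( \sum_{x, y \in X} \gamma_{\pi,m,r}(x, y) \cdot \mathit{cost}(x, r, y) \right) \\
&= \sum_{\pi \in \Pi,\ m \in \mathcal{M}} p^{t-1}(\pi, m) \cdot \mathit{cost}(\pi, r, \bar{\pi}_{\pi,m,r}) \\
&= \sum_{\pi \in \Pi,\ m \in \mathcal{M}} p^{t-1}(\pi, m) \cdot \mathit{cost}_A(\pi, m, r)
 = E(\mathit{cost}^t_A)
\end{aligned}
\]

and we are done. □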

Theorem 2 If $A$ is a mixed model online algorithm for an online problem $\mathcal{P}$, there
exist algorithms $A_1$, $A_2$, and $A_3$ for $\mathcal{P}$, of each of the standard models, such
that, given any request sequence $\varrho$, the cost (or expected cost) of each $A_i$ for
$\varrho$ is no greater than the cost (or expected cost) of $A$.

Proof: From Lemma 1 and Theorem 1. □

Corollary 1 If there is a $C$-competitive mixed model online algorithm for an online problem
$\mathcal{P}$, there is a $C$-competitive online algorithm for $\mathcal{P}$ for each of the three
standard models of randomized online algorithms.



3 Knowledge State Algorithms 

We say that a function $\omega : X \to \mathbb{R}$ is Lipschitz if
$\omega(y) \le \omega(x) + d(x, y)$ for all $x, y \in X$. An estimator is a non-negative Lipschitz
function $X \to \mathbb{R}$. If $S \subseteq X$, we say that $S$ supports an estimator $\omega$
if, for any $y \in X$, there exists some $x \in S$ such that $\omega(y) = \omega(x) + d(x, y)$. If
$\omega$ is supported by a finite set, then there is a unique minimal set $S$ which supports
$\omega$, which we call the estimator support of $\omega$. (We use the term "support" instead of
"estimator support" if the context excludes ambiguity.) We note that all estimators considered in
this paper have finite support. We say that an estimator $\omega$ has zero minimum if
$\min_{x \in X} \omega(x) = 0$. The next lemma allows us to compare estimators by examining
finitely many values.

Lemma 2 Suppose $\omega$ and $\omega'$ are estimators, and $S$ is the support of $\omega$. Then
$\omega(x) \ge \omega'(x)$ for all $x \in X$ if and only if $\omega(y) \ge \omega'(y)$ for all
$y \in S$.

Proof: One direction of the proof is trivial. For the other, suppose $\omega(x) < \omega'(x)$ for
some $x \in X$ and $\omega(y) \ge \omega'(y)$ for all $y \in S$. Then there exists $y \in S$ such
that $\omega(x) = \omega(y) + d(y, x)$. It follows that
$\omega(y) = \omega(x) - d(y, x) < \omega'(x) - d(y, x) \le \omega'(y)$, contradiction. □
An example of an estimator is the work function of a request sequence. If $x, y \in X$, we write
$\mathit{cost}^{\varrho}_{opt}(x, y)$ to denote the minimal cost of servicing the request sequence
$\varrho$ starting at configuration $x$ and ending at configuration $y$. Then, if $\varrho$ is a
request sequence, the work function $\omega^{\varrho} : X \to \mathbb{R}$ is defined by
$\omega^{\varrho}(x) = \mathit{cost}^{\varrho}_{opt}(s^0, x)$. If $\varrho$ is a request sequence,
the offset function is defined to be $\omega^{\varrho} - \mathit{cost}_{opt}(\varrho)$, a zero
minimum estimator. If $\omega$ is an estimator and if $r \in \mathcal{R}$ is a request, we define
the function $\omega \wedge r$ by
$(\omega \wedge r)(y) = \min_{x \in X} \{\omega(x) + \mathit{cost}(x, r, y)\}$. We call
"$\wedge$" the update operator. The following lemma allows us to compute the update in finitely
many steps.
Lemma 3 If $\omega$ is supported by $S$, then
$(\omega \wedge r)(y) = \min_{x \in S} \{\omega(x) + \mathit{cost}(x, r, y)\}$.

Proof: Trivially, $(\omega \wedge r)(y) \le \min_{x \in S} \{\omega(x) + \mathit{cost}(x, r, y)\}$.
Pick $z \in X$ such that $(\omega \wedge r)(y) = \omega(z) + \mathit{cost}(z, r, y)$. Pick
$x \in S$ such that $\omega(z) = \omega(x) + d(x, z)$. Then

\[
\begin{aligned}
(\omega \wedge r)(y) &= \omega(z) + \mathit{cost}(z, r, y) = \omega(x) + d(x, z) + \mathit{cost}(z, r, y) \\
&\ge \omega(x) + \mathit{cost}(x, r, y) \ge (\omega \wedge r)(y)
\end{aligned}
\]

and we are done. □

We note that it is easy to verify that $\omega \wedge r$ is also an estimator. We briefly note the
following lemma, which is well-known (see, for example, [8]).

Lemma 4 If $\varrho = r^1 \ldots r^n$, let $\varrho^t = r^1 \ldots r^t$ for all $t \le n$. Then
$\omega^{\varrho^0}(x) = d(s^0, x)$ for all $x \in X$ and
$\omega^{\varrho^t} = \omega^{\varrho^{t-1}} \wedge r^t$ for all $t > 0$.

We use estimators and adjustments to analyze the competitiveness of an online algorithm $A$. More
specifically, the combination of estimators and adjustments allows us to estimate the optimal cost.
An online algorithm does not know the optimal offline algorithm's cost at any given time, but
can keep track of the estimator, and use it as a guide. The estimator is a real-valued function on
configurations that is updated at every step, and which estimates the cost of the optimal offline
algorithm, while the adjustment is a real number that is computed at every step. Both the estimator
and the adjustment may be calculated using randomization.

A knowledge state algorithm is a mixed online algorithm that computes an adjustment and an
estimator at each step, and uses the current estimator as its memory state. More formally, if $A$
is a knowledge state algorithm, then:

1. At any given step, the full state of $A$ is a pair $(\pi, \omega)$, where $\pi \in \Pi$ and
$\omega : X \to \mathbb{R}$ is the current estimator. We call that pair the current knowledge state.

2. If $k = (\pi, \omega)$ is the knowledge state and the next request is $r$, then $A$ computes an
adjustment, a number which we call $\mathit{adjust}_A(k, r)$, and uses randomization to pick a new
knowledge state $k' = (\pi', \omega')$. More precisely, there are subsequent knowledge states
$k_i = (\pi_i, \omega_i)$ and subsequent weights $\lambda_i$ for $i = 1, \ldots, m$ such that

(a) $(\omega \wedge r)(x) \ge \mathit{adjust}_A(k, r) + \sum_{i=1}^{m} \lambda_i\,\omega_i(x)$ for
each $x \in X$.

(b) For each $i$, $A$ chooses $k'$ to be $k_i$ with probability $\lambda_i$.

(c) Let $\bar{\pi} = \sum_{i=1}^{m} \lambda_i \pi_i$. Define
$\mathit{cost}_A(k, r) = \mathit{cost}(\pi, r, \bar{\pi})$ (as defined in the previous section
in terms of the transportation problem).

3. Finally, if $\varrho = r^1 \ldots r^n$ is the input request sequence, and the sequence of full
states of $A$ is $k^1 \ldots k^n$, where $k^t = (\pi^t, \omega^t)$, we define

\[
\mathit{cost}_A(\varrho) = \sum_{t=1}^{n} \mathit{cost}_A(k^{t-1}, r^t)
\quad \text{and} \quad
\mathit{adjust}_A(\varrho) = \sum_{t=1}^{n} \mathit{adjust}_A(k^{t-1}, r^t).
\]
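The update operator and condition 2(a) can be checked mechanically on a small instance. The following sketch uses the 2-cache cost function defined later in Section 4 (unit cost per page fault), represents an estimator by its values on a supporting set, and evaluates the update by minimizing over that set only, as Lemma 3 permits. The function names and the four-page universe are illustrative assumptions.

```python
from itertools import combinations

def d(x, y):
    """Distance between two cache configurations (frozensets of equal size)."""
    return len(x - y)

def cost(x, r, y):
    """Cost of serving page r while moving from configuration x to y."""
    if x == y and r not in x:
        return 2
    if r in x or r in y:
        return d(x, y)
    return d(x, y) + 1

def value(omega, y):
    """Extend an estimator from its support: omega(y) = min_x omega(x) + d(x, y)."""
    return min(w + d(x, y) for x, w in omega.items())

def update(omega, r, y):
    """(omega ^ r)(y), minimizing over the support only (Lemma 3)."""
    return min(w + cost(x, r, y) for x, w in omega.items())

# Estimator supported by {a, b} with value 0 there, and request c.
omega = {frozenset("ab"): 0}
# Candidate subsequent estimator: value 0 on {c, a} and {c, b}; adjustment 1.
omega_next = {frozenset("ca"): 0, frozenset("cb"): 0}
adjust = 1

configs = [frozenset(c) for c in combinations("abcd", 2)]
holds = all(update(omega, "c", y) >= adjust + value(omega_next, y) for y in configs)
tight = all(update(omega, "c", y) == adjust + value(omega_next, y) for y in configs)
```

On this instance condition 2(a) holds with equality at every configuration over the pages a, b, c, d, so an adjustment of 1 is the largest possible for this choice of subsequent estimator.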
If $S \subseteq X$, we say that a knowledge state $(\pi, \omega)$ is supported as a knowledge
state by $S$ if $\omega$ is supported by $S$ (in the estimator sense) and $\pi$ is supported
(distributionally) by $S$. Note that, in this case, $(\pi, \omega)$ can be represented by the
finite set of triples $\{(x, \pi(x), \omega(x))\}_{x \in S}$. We say that a knowledge state
algorithm has finite support if there is a uniform bound on the cardinality of the supports of the
knowledge states. This bound is also called the order of the knowledge state algorithm.

We say that $A$ is $C$-competitive as a knowledge state algorithm if there is a constant $K$ such
that $E(\mathit{cost}_A(\varrho)) \le C \cdot E(\mathit{adjust}_A(\varrho) + \omega^n(x)) + K$ for
any request sequence $\varrho = r^1 \ldots r^n$ and any $x \in X$.

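Lemma 5 Given a request sequence $\varrho = r^1 \ldots r^n$, then for all $x \in X$

\[ E(\omega^n(x) + \mathit{adjust}_A(\varrho)) \le \mathit{cost}^{\varrho}_{opt}(s^0, x) \]

Proof: Let $s^0 = x^0, x^1, \ldots, x^n = x \in X$ be the optimal service of $\varrho$ that ends
in $x$. Thus $\sum_{t=1}^{n} \mathit{cost}(x^{t-1}, r^t, x^t) = \mathit{cost}^{\varrho}_{opt}(s^0, x)$.
By condition 2(a) of the definition of a knowledge state algorithm,
$E(\omega^t(x^t) + \mathit{adjust}_A(k^{t-1}, r^t)) \le E((\omega^{t-1} \wedge r^t)(x^t))$ for
all $t$. By definition of the update operator,
$E((\omega^{t-1} \wedge r^t)(x^t)) \le E(\omega^{t-1}(x^{t-1})) + \mathit{cost}(x^{t-1}, r^t, x^t)$
for all $t$. Summing the inequalities over all $t$, and adding to the equation, we obtain the
result. □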

Lemma 6 If a knowledge state algorithm $A$ is $C$-competitive as a knowledge state algorithm, then
$A$ is $C$-competitive.

Proof: Let $K$ be the constant given in the definition of $C$-competitiveness for a knowledge state
algorithm. Let $\varrho = r^1 \ldots r^n$ be any request sequence, and let
$s^0 = x^0, x^1, \ldots, x^n \in X$ be the optimal service of $\varrho$. Since $A$ is
$C$-competitive as a knowledge state algorithm:

\[ E(\mathit{cost}_A(\varrho)) \le C \cdot E(\mathit{adjust}_A(\varrho)) + C \cdot E(\omega^n(x^n)) + K \]
\[ E(\mathit{adjust}_A(\varrho) + \omega^n(x^n)) \le \mathit{cost}_{opt}(\varrho) \quad \text{(by Lemma 5)} \]

We obtain: $E(\mathit{cost}_A(\varrho)) \le C \cdot \mathit{cost}_{opt}(\varrho) + K$.

□

We now define a $C$-knowledge state potential ($C$-ks-potential, for short) for a given knowledge
state algorithm $A$. Let $\Phi_A$ be a real-valued function on knowledge states. Then we say that
$\Phi_A$ is a $C$-ks-potential for $A$ if

1. $\Phi_A(k) \ge 0$ for any $k$.

2. If $k = (\pi, \omega)$ is the current knowledge state and $r$ is the next request,
$\{k_i = (\pi_i, \omega_i)\}$ are the subsequents of that request, and $\{\lambda_i\}$ are the
weights of the subsequents, let
$\Delta\Phi_A(k, r) = \sum_{i=1}^{m} \lambda_i \Phi_A(\pi_i, \omega_i) - \Phi_A(\pi, \omega)$. Then

\[ \mathit{cost}_A(k, r) + \Delta\Phi_A(k, r) \le C \cdot \mathit{adjust}_A(k, r). \]

Theorem 3 If a knowledge state algorithm $A$ has a $C$-ks-potential, then $A$ is $C$-competitive.

Proof: The proof follows easily from the definition of a $C$-ks-potential and Lemmas 5 and 6 by
straightforward arguments. Let $\varrho = r^1 \ldots r^n$ be a request sequence. Let
$k^1, \ldots, k^n$ be the sequence of knowledge states of $A$ given the input $\varrho$, where
$k^t = (\pi^t, \omega^t)$. Let $\Phi^t_A = \Phi_A(k^t)$, a random variable for each $t$. Note that
$\Phi^0_A$ is a constant. Let $\Delta^t\Phi_A = \Delta\Phi_A(k^{t-1}, r^t)$. Note that
$E(\Delta^t\Phi_A) = E(\Phi^t_A - \Phi^{t-1}_A)$. Let $x \in X$ be the configuration of the optimal
algorithm after $n$ steps. Then

\[
\begin{aligned}
C \cdot \mathit{cost}_{opt}(\varrho) - E(\mathit{cost}_A(\varrho))
&\ge C \cdot E(\omega^n(x) + \mathit{adjust}_A(\varrho)) - E(\mathit{cost}_A(\varrho)) \\
&= C \cdot E\Big(\omega^n(x) + \sum_{t=1}^{n} \mathit{adjust}_A(k^{t-1}, r^t)\Big)
   - E\Big(\sum_{t=1}^{n} \mathit{cost}_A(k^{t-1}, r^t)\Big) \\
&= E\Big(C \cdot \omega^n(x) + \sum_{t=1}^{n} \big(C \cdot \mathit{adjust}_A(k^{t-1}, r^t)
   - \mathit{cost}_A(k^{t-1}, r^t)\big)\Big) \\
&= E\Big(C \cdot \omega^n(x) + \Phi^n_A + \sum_{t=1}^{n} \big(C \cdot \mathit{adjust}_A(k^{t-1}, r^t)
   - \mathit{cost}_A(k^{t-1}, r^t) - \Delta^t\Phi_A\big)\Big) - \Phi^0_A \\
&\ge E\big(C \cdot \omega^n(x) + \Phi^n_A\big) - \Phi^0_A \\
&\ge -\Phi^0_A
\end{aligned}
\]

The first inequality above is from Lemma 5. The last two inequalities are from the definition of a
$C$-ks-potential. It follows that
$E(\mathit{cost}_A(\varrho)) \le C \cdot \mathit{cost}_{opt}(\varrho) + \Phi^0_A$ and, by Lemma 6,
we are done.

□

We can define a forgiveness online algorithm to be a knowledge state algorithm with the special
restriction that there is always exactly one subsequent. We note that historically, forgiveness came
first, so we can think of the knowledge state approach as being a generalization of forgiveness.
A forgiveness algorithm can be deterministic, such as EQUIPOISE, a deterministic online
11-competitive algorithm for the 3-server problem (that was the best known competitiveness for that
problem at that time), or distributional, such as EQUITABLE, an $H_k$-competitive distributional
online algorithm for the $k$-cache problem. (See [H|9].)



4 Knowledge State Algorithms for the Cache Problem 

We now consider the $k$-cache problem for fixed $k \ge 2$. The $k$-cache problem reduces to online
optimization, as defined in Section 2 of this paper, as follows:

1. There is a set of pages.

2. $X$ is the set of all $k$-tuples of distinct pages. If the configuration of an algorithm is
$x \in X$, that means that the pages that constitute $x$ are in the cache.

3. The initial configuration is the initial cache.

4. If $x, y \in X$, then $d(x, y)$ is the cost of changing the cache from $x$ to $y$. Since we
assume that it costs 1 to eject a page and bring in a new page, $d(x, y)$ is the cardinality of
the set $x - y$.

5. $\mathcal{R}$ is simply the set of all pages. If a page $r$ is requested, it means that the
algorithm must ensure that $r$ is in the cache at some point as it moves between configurations.
Thus, for any $x, y \in X$ and any $r \in \mathcal{R}$, we have

\[
\mathit{cost}(x, r, y) =
\begin{cases}
2 & \text{if } x = y,\ r \notin x \\
d(x, y) & \text{if } r \in x \text{ or } r \in y \\
d(x, y) + 1 & \text{otherwise}
\end{cases}
\]

To complete the reduction, we observe that the support of any configuration request pair $(x, r)$
is finite. If $r \in x$, that support has only one element, namely $x$, while otherwise, it has
$k$ elements, namely $\{x - a + r \mid a \in x\}$.
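The distance, the request cost, and the support of a configuration request pair translate directly into code. The sketch below models configurations as Python `frozenset`s of page names; this representation and the function names are illustrative choices, not taken from the paper.

```python
def d(x, y):
    """Number of pages that must be ejected to turn cache x into cache y."""
    return len(x - y)

def cost(x, r, y):
    """Cost of moving from x to y while having page r in the cache at some point."""
    if x == y and r not in x:
        return 2            # bring r in and restore the original cache
    if r in x or r in y:
        return d(x, y)      # r is present at the start or at the end
    return d(x, y) + 1      # one extra fault to pass through r

def support(x, r):
    """Configurations reachable from x when serving r in the cheapest way."""
    if r in x:
        return [x]
    return [(x - {a}) | {r} for a in sorted(x)]

ab = frozenset("ab")
hit = cost(ab, "a", ab)
miss_and_return = cost(ab, "c", ab)
miss_and_keep = cost(ab, "c", frozenset("ac"))
detour = cost(ab, "c", frozenset("ad"))
```

For `k = 2` and the cache `{a, b}`, a request to `c` has the two-element support `{b, c}`, `{a, c}`, matching the count of `k` elements stated above.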
Bar Notation for the Cache Problem. We introduce a convenient notation, a modification
of the bar notation of Koutsoupias and Papadimitriou for offset functions for the $k$-cache
problem, which we call the bar notation.$^{1}$ Let $\alpha$ be a string consisting of at least $k$
page names and exactly $k$ bars, with the condition that at least $i$ page names are to the left
of the $i$-th bar. Then $\alpha$ defines an offset function $\omega$ as follows. Let
$S \subseteq X$ be the set of all configurations $x$ such that, for each $i = 1, \ldots, k$, the
names of at least $i$ members of $x$ are written to the left of the $i$-th bar. Let $\omega$ be
the estimator such that $S$ is the support of $\omega$, and such that $\omega(x) = 0$ for each
$x \in S$. For example, for $k = 2$, $ab||$ denotes the estimator whose support consists of just
the configuration $\{a, b\}$, and which takes the value zero on that configuration. For $k = 4$,
$ab||cd|ef|$ denotes the estimator whose support consists of the configurations
$\{a, b, c, d\}$, $\{a, b, c, e\}$, $\{a, b, c, f\}$, $\{a, b, d, e\}$, and $\{a, b, d, f\}$,
and which takes the value zero on those configurations. From we have: 

Lemma 7 A function $\omega$ is an offset function for the $k$-cache problem if and only if it can
be expressed using the bar notation.
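The rule that turns a bar string into its support can be sketched in a few lines. The code below assumes, for illustration only, that every page name is a single character and that `|` is the bar symbol; `bar_support` is a hypothetical helper name.

```python
from itertools import combinations

def bar_support(s, k):
    """Support of the estimator written in bar notation as the string s.

    A configuration x (a set of k pages) is in the support if, for each
    i = 1..k, at least i members of x appear to the left of the i-th bar.
    """
    left_of_bar = []          # left_of_bar[i-1] = pages left of the i-th bar
    seen = set()
    for ch in s:
        if ch == "|":
            left_of_bar.append(set(seen))
        else:
            seen.add(ch)
    if len(left_of_bar) != k:
        raise ValueError("the string must contain exactly k bars")
    return {
        frozenset(x)
        for x in combinations(sorted(seen), k)
        if all(len(set(x) & left) >= i for i, left in enumerate(left_of_bar, 1))
    }

single = bar_support("ab||", 2)
pair = bar_support("a|bc|", 2)
five = bar_support("ab||cd|ef|", 4)
```

The three calls reproduce the supports discussed in the text: one configuration for `ab||`, two for `a|bc|`, and the five listed configurations for `ab||cd|ef|`.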

4.1 A $\frac{3}{2}$-Competitive Knowledge State Algorithm for the 2-Cache Problem

Recall that PARTITION (introduced in [11]) is optimally competitive for the $k$-cache problem,
but uses unbounded memory to achieve the optimal competitiveness of $H_k$. The memory state of
PARTITION is, in fact, the classic offset function, which, in the worst case, requires keeping track
of every past request. We now show how the use of knowledge states simplifies the definition, and
in fact the memory requirement, of an optimally competitive randomized algorithm for the 2-cache
problem, which we call $K_2$.



Figure 1: Schematic for the 2-Cache Knowledge State Algorithm

Knowledge States of $K_2$. We will follow the rule that, at each step, the adjustment is as large
as possible, so that the minimum of the estimator will always be zero. This guarantees that any
potential will always be non-negative. If there are infinitely many pages, $K_2$ has infinitely
many knowledge states, but, up to symmetry, it has only two. Each such knowledge state of $K_2$ is
supported by a set of cardinality at most 2, hence has at most three active pages, and therefore its
equivalent behavioral algorithm has at most one bookmark.

In the definitions given below, we say that two pages are equivalent for a given knowledge
state if they can be transposed without changing the knowledge state.

1. If $a, b$ are pages, let $A^{a,b} = (\{a, b\},\ ab||)$. In this case, $a$ and $b$ are
equivalent, i.e., $A^{a,b} = A^{b,a}$.

2. If $a, b, c$ are pages, let
$B^{a,b,c} = \left(\frac{1}{2}\{a, b\} + \frac{1}{2}\{a, c\},\ a|bc|\right)$, where
$\frac{1}{2}\{a, b\} + \frac{1}{2}\{a, c\}$ denotes the distribution which is $\frac{1}{2}$ on the
configuration $\{a, b\}$ and $\frac{1}{2}$ on the configuration $\{a, c\}$. In this case $b$ and
$c$ are equivalent, i.e., $B^{a,b,c} = B^{a,c,b}$.

$^{1}$ The notation of [11] differs slightly from that given here, although it is based on the same concept.
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We list below the action of Ki- In each case, a, b, c, d are distinct pages. 

1. If {a, b} is the initial cache, the initial knowledge state is $A^{a,b}$.

2. If the current knowledge state is $A^{a,b}$ then

(i) if the request is a, the new knowledge state is $A^{a,b}$.

(ii) if the request is c, then the new knowledge state is $B^{c,a,b}$.

3. If the current knowledge state is $B^{a,b,c}$ then

(i) if the new request is a, the new knowledge state is $B^{a,b,c}$.

(ii) if the new request is b, the new knowledge state is $A^{b,a}$.

(iii) if the new request is $d \notin \{a, b, c\}$, then there are three subsequents, namely $A^{d,a}$, $A^{d,b}$, $A^{d,c}$.
The distribution on the subsequents is uniform, i.e., each is chosen with probability $\tfrac13$.

Actions 2(i) and 3(i) are requests to the first block of pages, in the sense of the bar notation. Since
the bar notation implies that each page in the first block can be assumed to be in the cache, such
a request is ignored by any sensible online algorithm, which means, in our case, that the estimator
is unchanged and the adjustment is zero. We call such requests trivial.
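The transition rules of K2 listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of knowledge states, the names `k2_step` and `active_pages`, and the single-character page names are hypothetical choices made here, not part of the definition of K2.

```python
import random

def k2_step(state, r, rng=random):
    """Return the next knowledge state of K2 after request r.

    Hypothetical encoding: ("A", {a, b}) stands for A^{a,b} and
    ("B", a, {b, c}) stands for B^{a,b,c}. Frozensets capture the
    equivalences A^{a,b} = A^{b,a} and B^{a,b,c} = B^{a,c,b}.
    """
    if state[0] == "A":
        pages = state[1]
        if r in pages:                       # Action 2(i): trivial request
            return state
        return ("B", r, pages)               # Action 2(ii): request to a new page
    _, first, rest = state
    if r == first:                           # Action 3(i): trivial request
        return state
    if r in rest:                            # Action 3(ii): request to b (or the equivalent c)
        return ("A", frozenset({r, first}))
    # Action 3(iii): new page; choose one of the three subsequents uniformly
    keep = rng.choice(sorted(rest | {first}))
    return ("A", frozenset({r, keep}))

def active_pages(state):
    """Pages that occur in some support configuration of the state."""
    return set(state[1]) if state[0] == "A" else {state[1]} | set(state[2])

state = ("A", frozenset("ab"))
for r in "cabdcaeb":
    state = k2_step(state, r)
    assert len(active_pages(state)) <= 3     # never more than three active pages
```

The loop at the end replays a short request sequence and confirms that the sketch never tracks more than three active pages, i.e., at most one bookmark.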

We define a potential $\Phi$ by $\Phi(A^{a,b}) = 0$ and $\Phi(B^{a,b,c}) = \tfrac12$.

Lemma 8 $\Phi$ is a $\tfrac32$-ks-potential for K2.

Proof: Let k be the current knowledge state and r the new request. Write $\Delta\Phi$ for the increase in
potential in the given step. We will show that

$\mathit{cost} + \Delta\Phi \le \tfrac32\,\mathit{adjust}$ (1)

in all cases. In trivial actions, namely Cases 2(i) and 3(i), cost = $\Delta\Phi$ = adjust = 0, and we are done.
We first note that:

$ab|| \wedge c = c|ab| + 1$
$a|bc| \wedge b = ab||$
$a|bc| \wedge d \ge \tfrac13\,da|| + \tfrac13\,db|| + \tfrac13\,dc|| + \tfrac13$

By Lemma 2, the last inequality need only be verified for configurations in {{d,a},{d,b},{d,c}},
the support set of $a|bc| \wedge d$.

Case Action 2(ii). In this case $k = A^{a,b}$ and r is a new page, c.

$ab|| \wedge c = c|ab| + 1$, thus adjust = 1. Since the algorithm must bring in a new page, and since
the probability is zero that the minimum transport brings in any other page, cost = 1. $\Delta\Phi = \tfrac12$,
and we are done.

Case Action 3(ii), i.e., $k = B^{a,b,c}$ and r = b.

Recall $a|bc| \wedge b = ab||$. Note that adjust = 0, since, as functions, $ab|| \ge a|bc|$ on the set of
all configurations. cost = $\tfrac12$, since the probability is $\tfrac12$ that the algorithm does nothing, and the
probability is $\tfrac12$ that it ejects c and brings in b. $\Delta\Phi = -\tfrac12$, and we are done.

Case Action 3(iii), i.e., $k = B^{a,b,c}$ and r is a new page, d.

Recall $a|bc| \wedge d \ge \tfrac13\,da|| + \tfrac13\,db|| + \tfrac13\,dc|| + \tfrac13$, thus adjust = $\tfrac13$. Since the algorithm must bring in a
new page, and since the probability is zero that the minimum transport brings in any other page,
cost = 1. $\Delta\Phi = -\tfrac12$, and we are done.

This completes the proof of all cases. □ 
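The three relations noted at the start of the proof can also be checked by brute force over the four pages a, b, c, d. The sketch below rests on two assumptions that are not restated in this section: that a work function written in bar notation equals the distance from a configuration to the nearest support configuration, and that a request r updates it by (ω ∧ r)(X) = min over configurations Y containing r of ω(Y) + d(X, Y). The helper names are illustrative.

```python
from fractions import Fraction as Fr
from itertools import combinations

CONFIGS = [frozenset(c) for c in combinations("abcd", 2)]

def dist(x, y):
    """Uniform-metric distance between two caches of equal size."""
    return len(x - y)

def cone(*support):
    """Work function that is 0 on the support and grows with distance (assumed form)."""
    sup = [frozenset(s) for s in support]
    return lambda x: min(dist(x, s) for s in sup)

def serve(w, r):
    """Work function after request r (assumed update rule)."""
    return lambda x: min(w(y) + dist(x, y) for y in CONFIGS if r in y)

A_ab, B_abc = cone("ab"), cone("ab", "ac")

# ab|| ∧ c = c|ab| + 1
lhs, rhs = serve(A_ab, "c"), cone("ca", "cb")
assert all(lhs(x) == rhs(x) + 1 for x in CONFIGS)

# a|bc| ∧ b = ab||
lhs = serve(B_abc, "b")
assert all(lhs(x) == A_ab(x) for x in CONFIGS)

# a|bc| ∧ d >= (da|| + db|| + dc||)/3 + adjust, with adjust as large as possible
lhs = serve(B_abc, "d")
subs = [cone("da"), cone("db"), cone("dc")]
adjust = min(lhs(x) - Fr(sum(w(x) for w in subs), 3) for x in CONFIGS)

# Inequality (1) for Actions 2(ii), 3(ii), 3(iii): (cost, change in potential, adjust)
PHI_A, PHI_B = Fr(0), Fr(1, 2)
for cost, dphi, adj in [(Fr(1), PHI_B - PHI_A, Fr(1)),
                        (Fr(1, 2), PHI_A - PHI_B, Fr(0)),
                        (Fr(1), PHI_A - PHI_B, adjust)]:
    assert cost + dphi <= Fr(3, 2) * adj
```

Exact rational arithmetic is used so that the tight cases of inequality (1) are compared without rounding.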

We have:

Corollary 2 K2 is $\tfrac32$-competitive.

We note that the number of active pages, i.e., pages contained in a support configuration, is 
never more than three. The number three is minimal, as given by the theorem below: 
Theorem 4 There is no knowledge state algorithm for the 2-cache problem that is $\tfrac32$-competitive as
a knowledge state algorithm, and which never has more than two active pages, i.e., no bookmarks.
Proof: If a knowledge state algorithm for the 2-cache problem never has more than two active pages,
then it can have no bookmarks, hence is trackless. By Theorem 2 of [3], there is no $\tfrac32$-competitive
trackless online algorithm for the 2-cache problem. □

4.2 An Optimally Competitive Knowledge State Algorithm for the 3-Cache 
Problem 

We define a knowledge state algorithm K3 which is $H_3$-competitive for the 3-cache problem. Recall
that $H_3 = \tfrac{11}{6}$. Up to symmetry, K3 has six knowledge states. The number of active pages, i.e.,
pages contained in a support configuration, is never more than five.

The knowledge states of K3 will be defined as follows. As in the case of K2, we say that two
pages are equivalent if they can be transposed without changing the knowledge state.

1. $A^{a,b,c} = (\{a,b,c\},\ abc|||)$ for any three pages a, b, c. The pages a, b, and c are all equivalent,
i.e., $A^{a,b,c} = A^{b,a,c} = A^{a,c,b}$, etc.

2. $B^{a,b,c,d} = (\tfrac13\{a,b,c\} + \tfrac13\{a,b,d\} + \tfrac13\{a,c,d\},\ a|bcd||)$ for any four pages a, b, c, d. The pages
b, c, and d are all equivalent.

3. $C^{a,b,c,d} = (\tfrac12\{a,b,c\} + \tfrac12\{a,b,d\},\ ab||cd|)$ for any four pages a, b, c, d. The pages a and b are
equivalent, and c and d are equivalent.

4. $D^{a,b,c,d,e} = (\tfrac16\{a,b,c\} + \tfrac16\{a,b,d\} + \tfrac16\{a,b,e\} + \tfrac16\{a,c,d\} + \tfrac16\{a,c,e\} + \tfrac16\{a,d,e\},\ a|bcde||)$ for
any five pages a, b, c, d, e. The pages b, c, d, e are equivalent.

5. $E^{a,b,c,d,e} = (\tfrac12\{a,b,c\} + \tfrac14\{a,b,d\} + \tfrac14\{a,b,e\},\ ab||cde|)$ for any five pages a, b, c, d, e. The
pages a and b are equivalent, and d and e are equivalent.

6. $F^{a,b,c,d,e} = (\tfrac12\{a,b,c\} + \tfrac18\{a,b,d\} + \tfrac18\{a,b,e\} + \tfrac18\{a,c,d\} + \tfrac18\{a,c,e\},\ a|bc|de|)$ for any five
pages a, b, c, d, e. The pages b and c are equivalent, and d and e are equivalent.
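The supports of the six knowledge states can be recovered mechanically from their bar notation. The sketch below assumes the following reading of the notation, inferred from the six definitions above rather than quoted from the formal definition: a cache X of size k belongs to the support exactly when, for every i, X contains at least i of the pages written to the left of the i-th bar. The function name `support` and the string encoding are illustrative.

```python
from itertools import combinations

def support(bars, k):
    """Support configurations of a work function given in bar notation.

    Assumed reading: a k-set X is in the support when, for every i, X holds
    at least i of the pages standing to the left of the i-th bar.
    """
    blocks = bars.split("|")[:k]          # pages between consecutive bars
    pages = "".join(blocks)
    result = set()
    for x in combinations(sorted(pages), k):
        left, ok = set(), True
        for i, block in enumerate(blocks, start=1):
            left |= set(block)            # all pages left of the i-th bar
            if len(left & set(x)) < i:
                ok = False
                break
        if ok:
            result.add("".join(x))
    return sorted(result)

for name, bars in [("A", "abc|||"), ("B", "a|bcd||"), ("C", "ab||cd|"),
                   ("D", "a|bcde||"), ("E", "ab||cde|"), ("F", "a|bc|de|")]:
    print(name, bars, support(bars, 3))
```

Under this reading each printed support coincides with the set of configurations that carries positive probability in the corresponding distribution.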

The actions of K3 are formally defined below. In each case, a, b, c, d, e, f are distinct pages.
We do not need to consider separate cases for requests to pages which are equivalent.

1. If {a, b, c} is the initial cache, the initial knowledge state is $A^{a,b,c}$.

2. If the current knowledge state is $A^{a,b,c}$ then

(i) if the new request is a, the new knowledge state is $A^{a,b,c}$.

(ii) if the new request is some page $d \notin \{a, b, c\}$, the new knowledge state is $B^{d,a,b,c}$.

3. If the current knowledge state is $B^{a,b,c,d}$ then

(i) if the new request is a, the new knowledge state is $B^{a,b,c,d}$.

(ii) if the new request is b, the new knowledge state is $C^{a,b,c,d}$.

(iii) if the new request is some page $e \notin \{a, b, c, d\}$, the new knowledge state is $D^{e,a,b,c,d}$.

Figure 2: Schematic for the 3-Cache Knowledge State Algorithm

4. If the current knowledge state is $C^{a,b,c,d}$ then

(i) if the new request is a, the new knowledge state is $C^{a,b,c,d}$.

(ii) if the new request is c, the new knowledge state is $A^{a,b,c}$.

(iii) if the new request is some page $e \notin \{a, b, c, d\}$, the new knowledge state is $F^{e,a,b,c,d}$.

5. If the current knowledge state is $D^{a,b,c,d,e}$ then

(i) if the new request is a, the new knowledge state is $D^{a,b,c,d,e}$.

(ii) if the new request is b, the new knowledge state is $E^{a,b,c,d,e}$.

(iii) if the new request is some page $f \notin \{a, b, c, d, e\}$, then the new knowledge state is chosen
uniformly from among the following ten knowledge states: $A^{f,a,b}$, $A^{f,a,c}$, $A^{f,a,d}$, $A^{f,a,e}$, $A^{f,b,c}$,
$A^{f,b,d}$, $A^{f,b,e}$, $A^{f,c,d}$, $A^{f,c,e}$, and $A^{f,d,e}$.

6. If the current knowledge state is $E^{a,b,c,d,e}$ then

(i) if the new request is a, the new knowledge state is $E^{a,b,c,d,e}$.

(ii) if the new request is c, the new knowledge state is $A^{a,b,c}$.

(iii) if the new request is d, the new knowledge state is $A^{a,b,d}$.

(iv) if the new request is some page $f \notin \{a, b, c, d, e\}$, then the new knowledge state is $A^{f,a,b}$.

7. If the current knowledge state is $F^{a,b,c,d,e}$ then

(i) if the new request is a, the new knowledge state is $F^{a,b,c,d,e}$.

(ii) if the new request is b, the new knowledge state is $E^{a,b,c,d,e}$.

(iii) if the new request is d, the new knowledge state is $C^{a,d,b,c}$.

(iv) if the new request is some page $f \notin \{a, b, c, d, e\}$, the new knowledge state is chosen
uniformly from among the following six knowledge states: $C^{f,a,b,c}$, $C^{f,b,a,c}$, $C^{f,c,a,b}$, $C^{f,a,d,e}$,
$C^{f,b,d,e}$, and $C^{f,c,d,e}$.

We define a potential on the knowledge states as follows: $\Phi(A^{a,b,c}) = 0$, $\Phi(B^{a,b,c,d}) = \tfrac56$,
$\Phi(C^{a,b,c,d}) = \tfrac12$, $\Phi(D^{a,b,c,d,e}) = \tfrac53$, $\Phi(E^{a,b,c,d,e}) = 1$, and $\Phi(F^{a,b,c,d,e}) = \tfrac43$.

Lemma 9 $\Phi$ is an $\tfrac{11}{6}$-ks-potential for K3.

Proof: For each action of K3, let $\Delta\Phi$ be the increase in potential. We will show that

$\mathit{cost} + \Delta\Phi \le \tfrac{11}{6}\,\mathit{adjust}$ (2)

In each case, the value of $\Delta\Phi$ can be computed by simple subtraction. We need only compute the
values of cost and adjust for each action, after which the inequality (2) follows by simple arithmetic.

Case Actions 2(i), 3(i), 4(i), 5(i), 6(i), 7(i). These actions are trivial, and thus adjust = cost = $\Delta\Phi$ = 0, and
we are done.

Case Actions 2(ii), 3(iii), 4(iii). In these actions, the request is to a new page, and the probability that
any other page is in the cache after the action does not increase: thus cost = 1. We also know that
adjust = 1 because

$abc||| \wedge d = d|abc|| + 1$
$a|bcd|| \wedge e = e|abcd|| + 1$
$ab||cd| \wedge e = e|ab|cd| + 1$

The remainder of the verification of (2) for each of those actions consists of simple arithmetic.
Case Actions 6(ii) and 6(iii). Note that adjust = 0 since

$ab||cde| \wedge c = abc|||$
$ab||cde| \wedge d = abd|||$

In each case, we must keep a and b and eject whichever of the other two unrequested pages is in the cache. The probability is
$\tfrac12$ that c is in our cache, and $\tfrac14$ that d is in our cache, thus cost = $\tfrac12$ for Action 6(ii) and cost = $\tfrac34$ for
Action 6(iii). Since $\Delta\Phi = -1$ for both actions, we are done.

Case Actions 5(ii) and 7(ii). Note that adjust = 0 since

$a|bcde|| \wedge b = ab||cde|$
$a|bc|de| \wedge b = ab||cde|$

For Action 5(ii), recall that the distribution of $D^{a,b,c,d,e}$ is uniform on six configurations. To compute
cost, we describe a minimal transport between the distribution of $D^{a,b,c,d,e}$ and the distribution of
$E^{a,b,c,d,e}$. That transport is defined as follows:

If the previous configuration is {a,b,c}, {a,b,d}, or {a,b,e}, do nothing.

If the previous configuration is {a,c,d}, eject d.

If the previous configuration is {a,c,e}, eject e.

If the previous configuration is {a,d,e}, eject d with probability $\tfrac12$, and eject e with probability $\tfrac12$.

Thus, cost = $\tfrac12$. It is a routine verification that the required distribution for $E^{a,b,c,d,e}$ is achieved.
Since $\Delta\Phi = -\tfrac23$, we have verified (2) for Action 5(ii).

For Action 7(ii), recall that the distribution of $F^{a,b,c,d,e}$ is $\tfrac12$ on {a,b,c}, and is $\tfrac18$ on each of
{a,b,d}, {a,b,e}, {a,c,d}, and {a,c,e}. A minimal transport can be defined as follows: if b is
already in the cache we do nothing, while otherwise, we eject c. Thus, cost = $\tfrac14$. It is a routine
verification that the required distribution for $E^{a,b,c,d,e}$ is achieved. Since $\Delta\Phi = -\tfrac13$, we have verified
(2) for Action 7(ii).

Case Actions 3(ii), 4(ii), 7(iii). Note that adjust = 0 since

$a|bcd|| \wedge b = ab||cd|$
$ab||cd| \wedge c = abc|||$
$a|bc|de| \wedge d = ad||bc|$

For Action 3(ii), recall that the distribution of $B^{a,b,c,d}$ is uniform on {a,b,c}, {a,b,d}, and {a,c,d}.
If b is already in the cache we do nothing, while otherwise, we eject c with probability $\tfrac12$ and eject
d with probability $\tfrac12$. Thus, cost = $\tfrac13$. It is a routine verification that the required distribution for
$C^{a,b,c,d}$ is achieved. Since $\Delta\Phi = -\tfrac13$, we have verified (2) for Action 3(ii).

For Action 4(ii), recall that the distribution of $C^{a,b,c,d}$ is uniform on {a,b,c} and {a,b,d}. If c
is already in the cache we do nothing, while otherwise, we eject d. Thus, cost = $\tfrac12$. The resulting
distribution is concentrated at {a,b,c}, as required for the knowledge state $A^{a,b,c}$. Since $\Delta\Phi = -\tfrac12$,
we have verified (2) for Action 4(ii).

For Action 7(iii), recall that the distribution of $F^{a,b,c,d,e}$ is $\tfrac12$ on {a,b,c}, and is $\tfrac18$ on each of
{a,b,d}, {a,b,e}, {a,c,d}, and {a,c,e}. If d is already in the cache we do nothing. If e is in the
cache, we eject e. Otherwise, the cache must be {a,b,c}, in which case we eject b or c with equal
probability. Thus, cost = $\tfrac34$. It is a routine verification that the required distribution for $C^{a,d,b,c}$ is
achieved. Since $\Delta\Phi = -\tfrac56$, we have verified (2) for Action 7(iii).
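The "routine verifications" for the requests to pages that are already active can be carried out mechanically. The sketch below pushes each starting distribution through the ejection rule described in the text and compares the result with the target distribution. It assumes the distributions B, C, D uniform on their supports, E equal to 1/2, 1/4, 1/4 on {a,b,c}, {a,b,d}, {a,b,e}, and F equal to 1/2 on {a,b,c} and 1/8 on the other four configurations; the helper names `serve`, `among` and `uniform` are illustrative.

```python
from fractions import Fraction as Fr

def uniform(*configs):
    """Uniform distribution on the given configurations."""
    return {frozenset(c): Fr(1, len(configs)) for c in configs}

def serve(dist, r, eject):
    """Push a cache distribution through a request to page r.

    eject(x) returns {page: probability} for a cache x that lacks r.
    Returns the new distribution and the expected number of faults.
    """
    out, cost = {}, Fr(0)
    for x, p in dist.items():
        if r in x:                                   # hit: nothing moves
            out[x] = out.get(x, Fr(0)) + p
            continue
        cost += p                                    # fault: one page is loaded
        for page, q in eject(x).items():
            y = (x - {page}) | {r}
            out[y] = out.get(y, Fr(0)) + p * q
    return out, cost

def among(candidates):
    """Eject uniformly among those candidate pages that are in the cache."""
    def rule(x):
        present = sorted(set(candidates) & x)
        return {page: Fr(1, len(present)) for page in present}
    return rule

B = uniform("abc", "abd", "acd")
C = uniform("abc", "abd")
D = uniform("abc", "abd", "abe", "acd", "ace", "ade")
E = {frozenset("abc"): Fr(1, 2), frozenset("abd"): Fr(1, 4), frozenset("abe"): Fr(1, 4)}
F = {frozenset("abc"): Fr(1, 2),
     **{frozenset(s): Fr(1, 8) for s in ("abd", "abe", "acd", "ace")}}

def rule_7iii(x):
    # eject e if present, otherwise the cache is {a,b,c}: eject b or c
    return among("e")(x) if "e" in x else among("bc")(x)

checks = {
    "3(ii)":  (B, "b", among("cd"), C),
    "4(ii)":  (C, "c", among("d"), uniform("abc")),
    "5(ii)":  (D, "b", among("de"), E),
    "7(ii)":  (F, "b", among("c"), E),
    "7(iii)": (F, "d", rule_7iii, uniform("abd", "acd")),
}
costs = {}
for name, (src, r, rule, target) in checks.items():
    out, costs[name] = serve(src, r, rule)
    assert out == target, name
```

After the loop, `costs` holds the expected cost of each of the five actions as an exact fraction.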

Case Action 6(iv). Note that adjust $\ge 0$ since $ab||cde| \wedge f = f|ab|cde| + 1 \ge fab|||$. By Lemma 2,
this inequality need only be verified for the configurations in the support of $ab||cde| \wedge f$. Whatever
the initial configuration is, a and b are in the cache. Simply eject the other page. Thus, cost = 1.
$\Delta\Phi = -1$, and we are done.
Case Actions 5(iii) and 7(iv). Let

$\omega_{D,f} = \tfrac{1}{10}(fab||| + fac||| + fad||| + fae||| + fbc||| + fbd||| + fbe||| + fcd||| + fce||| + fde|||)$,

and let

$\omega_{F,f} = \tfrac16(fa||bc| + fb||ac| + fc||ab| + fa||de| + fb||de| + fc||de|)$.

We note:

$a|bcde|| \wedge f = f|abcde|| + 1 \ge \omega_{D,f} - \tfrac15$
$a|bc|de| \wedge f = f|abc|de| + 1 \ge \omega_{F,f} + \tfrac16$

By Lemma 2, these inequalities need only be verified for the configurations in the support of
$a|bcde|| \wedge f$ and $a|bc|de| \wedge f$, respectively. We thus have adjust $\ge -\tfrac15$ for Action 5(iii) and adjust $\ge \tfrac16$ for Action 7(iv).

To compute cost, we give minimal transportations from the distribution of $D^{a,b,c,d,e}$, respectively
$F^{a,b,c,d,e}$, to the weighted sum of distributions of the subsequents, for each of the two cases. For
Action 5(iii), whatever the initial configuration is, a is in the cache. Eject a with probability $\tfrac35$,
and eject each of the other two pages with probability $\tfrac15$ each. It is a routine verification that the
required distribution is achieved. Thus, cost = 1. $\Delta\Phi = -\tfrac53$, and we are done.

For Action 7(iv), the probability is $\tfrac12$ that the initial configuration is {a,b,c}. In this case, eject
one of the three pages, each with probability $\tfrac13$. Otherwise, the cache will contain a, and either
b or c but not both: eject a with probability $\tfrac23$, and otherwise eject whichever of b or c is in the cache. It is a routine
verification that the required distribution is achieved. Thus, cost = 1. $\Delta\Phi = -\tfrac56$, and we are done.
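The two transports for requests to a new page f can be checked in the same spirit. The sketch below assumes the ejection probabilities 3/5 and 1/5 for Action 5(iii), and 1/3 (from {a,b,c}) and 2/3 (for page a otherwise) for Action 7(iv), together with D uniform on its six configurations and F equal to 1/2 on {a,b,c} and 1/8 on the other four. It compares the resulting distribution with the uniformly weighted sum of the distributions of the subsequents. All helper names are illustrative.

```python
from fractions import Fraction as Fr
from itertools import combinations

def fault(dist, r, eject):
    """Every cache in dist lacks r: eject according to eject(x), then load r."""
    out = {}
    for x, p in dist.items():
        for page, q in eject(x).items():
            y = (x - {page}) | {r}
            out[y] = out.get(y, Fr(0)) + p * q
    return out

def mixture(states):
    """Uniformly weighted sum of the distributions of the subsequents."""
    out = {}
    for state in states:
        for x, p in state.items():
            out[x] = out.get(x, Fr(0)) + p / len(states)
    return out

D = {frozenset(("a",) + xy): Fr(1, 6) for xy in combinations("bcde", 2)}
F = {frozenset("abc"): Fr(1, 2),
     **{frozenset(s): Fr(1, 8) for s in ("abd", "abe", "acd", "ace")}}

def eject_5iii(x):
    # a leaves with probability 3/5, each other page with probability 1/5
    return {page: (Fr(3, 5) if page == "a" else Fr(1, 5)) for page in x}

def eject_7iv(x):
    if x == frozenset("abc"):
        return {page: Fr(1, 3) for page in x}
    (other,) = x & {"b", "c"}             # exactly one of b, c is present
    return {"a": Fr(2, 3), other: Fr(1, 3)}

# Subsequents: ten states A^{f,x,y} and six states C^{f,p,q,r}
A_states = [{frozenset(("f",) + xy): Fr(1)} for xy in combinations("abcde", 2)]

def C_state(p, q, r):
    return {frozenset(("f", p, q)): Fr(1, 2), frozenset(("f", p, r)): Fr(1, 2)}

C_states = [C_state(*t) for t in ("abc", "bac", "cab", "ade", "bde", "cde")]

assert fault(D, "f", eject_5iii) == mixture(A_states)
assert fault(F, "f", eject_7iv) == mixture(C_states)
```

Since every starting configuration lacks f, each transport moves exactly one page, which is the cost of 1 used in the proof.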

This completes the proof of all cases. □ 

Corollary 3 K3 is $\tfrac{11}{6}$-competitive.
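As a sanity check on the arithmetic behind Lemma 9, the sketch below recomputes the largest possible adjustment for Actions 5(iii) and 7(iv) by brute force over six pages and then tests inequality (2) for every nontrivial action. It assumes, as in the bar notation, that a work function equals the distance to its nearest support configuration, that a request updates it by the usual minimum over configurations containing the requested page, and that the potential values are 0, 5/6, 1/2, 5/3, 1, 4/3 for A through F. The table of costs and the helper names are illustrative transcriptions, not part of the original proof.

```python
from fractions import Fraction as Fr
from itertools import combinations

CONFIGS = [frozenset(c) for c in combinations("abcdef", 3)]

def dist(x, y):
    return len(x - y)

def cone(*support):
    """Work function as distance to the nearest support configuration (assumed)."""
    sup = [frozenset(s) for s in support]
    return lambda x: min(dist(x, s) for s in sup)

def serve(w, r):
    """Work function after a request to page r (assumed update rule)."""
    return lambda x: min(w(y) + dist(x, y) for y in CONFIGS if r in y)

def largest_adjust(w, r, subsequents):
    """Largest constant c with (w ∧ r) >= average of subsequents + c."""
    after, n = serve(w, r), len(subsequents)
    return min(after(x) - Fr(sum(s(x) for s in subsequents), n) for x in CONFIGS)

w_D = cone("abc", "abd", "abe", "acd", "ace", "ade")
w_F = cone("abc", "abd", "abe", "acd", "ace")
subs_D = [cone("f" + x + y) for x, y in combinations("abcde", 2)]
subs_F = [cone("f" + p + q, "f" + p + r)
          for p, q, r in ("abc", "bac", "cab", "ade", "bde", "cde")]
adj_5iii = largest_adjust(w_D, "f", subs_D)
adj_7iv = largest_adjust(w_F, "f", subs_F)

PHI = dict(A=Fr(0), B=Fr(5, 6), C=Fr(1, 2), D=Fr(5, 3), E=Fr(1), F=Fr(4, 3))
# (action, from, to, cost, adjust)
ACTIONS = [
    ("2(ii)", "A", "B", Fr(1), Fr(1)),
    ("3(ii)", "B", "C", Fr(1, 3), Fr(0)),
    ("3(iii)", "B", "D", Fr(1), Fr(1)),
    ("4(ii)", "C", "A", Fr(1, 2), Fr(0)),
    ("4(iii)", "C", "F", Fr(1), Fr(1)),
    ("5(ii)", "D", "E", Fr(1, 2), Fr(0)),
    ("5(iii)", "D", "A", Fr(1), adj_5iii),
    ("6(ii)", "E", "A", Fr(1, 2), Fr(0)),
    ("6(iii)", "E", "A", Fr(3, 4), Fr(0)),
    ("6(iv)", "E", "A", Fr(1), Fr(0)),
    ("7(ii)", "F", "E", Fr(1, 4), Fr(0)),
    ("7(iii)", "F", "C", Fr(3, 4), Fr(0)),
    ("7(iv)", "F", "C", Fr(1), adj_7iv),
]
# slack = (11/6) * adjust - (cost + change in potential); must be non-negative
slack = {name: Fr(11, 6) * adj - (cost + PHI[dst] - PHI[src])
         for name, src, dst, cost, adj in ACTIONS}
assert all(s >= 0 for s in slack.values())
```

Actions with zero slack are the ones where inequality (2) is tight; under the assumed potential these include the cycle 2(ii), 3(ii), 4(ii), whose total cost 1 + 1/3 + 1/2 equals 11/6.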

5 Experimental Work and the Server Problem 

It is our hope that our technique will yield an order 2 knowledge state algorithm whose competi- 
tiveness is provably less than 2 for all metric spaces. 

We briefly mention progress by giving results for a class that is "one step up" in complexity from
the class of uniform metric spaces. We consider the class of metric spaces $M_{2,4}$, which consists of
all metric spaces where every distance is either 1 or 2, and where the perimeter of every triangle 
is either 3 or 4. (The classic octahedral graph, which has six points, is a member of this class, as 
defined by Schlafli [32].) We have a computer generated order 2 knowledge state algorithm for the 2- 
server problem in this class: its competitiveness is |. We note that we also have calculated (through 
computer experimentation) the minimum value of C in the sense that no lower competitiveness for 
any order 2 knowledge state algorithm for $M_{2,4}$ can be proved using the methods described here.
This value is C = ^t^~~- We briefly mention that there is an order 3 knowledge state algorithm 
for M2,4 which has, up to equivalence, only seven knowledge states, and is ^-competitive. We 
also can prove that no randomized online algorithm for the 2-server problem for $M_{2,4}$ can achieve
competitiveness less than j|. All knowledge states and probabilities in this order 3 algorithm can 
be described using only rational numbers. 

These results, as well as our results for the server problem in uniform spaces (equivalent to 
the caching problem), indicate a natural trade-off between competitiveness and memory of online 
randomized algorithms.